5 research outputs found

    Cost-Quality Trade-Offs in One-Class Active Learning

    Get PDF
    Active learning is a paradigm to involve users in a machine learning process. The core idea of active learning is to ask a user to annotate a specific observation to improve the classification performance. One important application of active learning is detecting outliers, i.e., unusual observations that deviate from the regular ones in a data set. Applying active learning for outlier detection in practice requires to design a system that consists of several components: the data, the classifier that discerns between inliers and outliers, the query strategy that selects the observations for feedback collection, and an oracle, e.g., the human expert that annotates the queries. Each of these components and their interplay influences the classification quality. Naturally, there are cost budgets limiting certain parts of the system, e.g., the number of queries one can ask a human. Thus, to configure efficient active learning systems, one must decide on several trade-offs between costs and quality. The existing literature on active learning systems does not provide an overview nor a formal description of the cost-quality trade-offs of active learning. All this makes the configuration of efficient active learning systems in practice difficult. In this thesis, we study different cost-quality trade-offs that are pivotal for configuring an active learning system for outlier detection. We first provide an overview of the costs of an active learning system. Then, we analyze three important trade-offs and propose ways to model and quantify them. In our first contribution, we study how one can reduce classification training costs by training only on a sample of the data set. We formalize the sampling trade-off between classifier training costs and resulting quality as an optimization problem and propose an efficient algorithm to solve it. Compared to the existing sampling methods in literature, our approach guarantees that a classifier trained on our sample makes the same predictions as if trained on the complete data set. We can therefore reduce the classification training costs without a loss of classification quality. In our second contribution, we investigate how selecting multiple queries allows trading off costs against quality. So-called batch queries reduce classifier training costs because the system only updates the classifier once for each batch. But the annotation of a batch may give redundant information, which reduces the achievable quality with a fixed query budget. We are the first to consider batch queries for outlier detection, a generalization of the more common case to query sequentially. We formalize batch active learning and propose several strategies to construct batches by modeling the expected utility of a batch. In our third contribution, we propose query synthesis for outlier detection. Query synthesis allows to artificially generate queries at any point in the data space without being restricted by a pool of query candidates. We propose a framework to efficiently synthesize queries and develop a novel query strategy to improve the generalization of a classifier beyond a biased data set with active learning. For all contributions, we derive recommendations for the cost-quality trade-offs from formal investigations and empirical studies to facilitate the configuration of robust and efficient active learning systems for outlier detection

    Validating one-class active learning with user studies – A prototype and open challenges

    Get PDF
    Active learning with one-class classifiers involves users in the detection of outliers. The evaluation of one-class active learning typically relies on user feedback that is simulated, based on benchmark data. This is because validations with real users are elaborate. They require the de-sign and implementation of an interactive learning system. But without such a validation, it is unclear whether the value proposition of active learning does materialize when it comes to an actual detection of out-liers. User studies are necessary to find out when users can indeed provide feedback. In this article, we describe important characteristics and pre-requisites of one-class active learning for outlier detection, and how they influence the design of interactive systems. We propose a reference architecture of a one-class active learning system. We then describe design alternatives regarding such a system and discuss conceptual and technical challenges. We conclude with a roadmap towards validating one-class active learning with user studies
    corecore